Add Python SDK with integration tests#6
Conversation
Scaffolds packages/sdk-python/ with: - Pydantic models generated from live registry OpenAPI spec (https://registry.mpak.dev/docs/json via datamodel-code-generator) - scripts/generate-types.py to regenerate types from spec - MpakClient with get_bundle_download, load_bundle, load_bundle_from_url - mpak-loader CLI (drop-in replacement for mcpb-loader.py) - Error hierarchy matching TypeScript SDK (MpakError, MpakNotFoundError, MpakIntegrityError, MpakNetworkError) - Platform detection and SHA256 fail-closed integrity verification - 12 unit tests passing, ruff lint clean - Dependencies: httpx, pydantic (no email-validator)
Fix two bugs that prevented the SDK from working against the live registry: add Accept: application/json header so the download endpoint returns JSON metadata instead of a 302 redirect, and correct the architecture mapping from amd64 to x64 to match the registry convention. Add 8 integration tests (marked @pytest.mark.integration) that hit the live registry.mpak.dev to verify resolve, download, SHA256 verification, and extraction work end-to-end.
|
Requesting review from @shwetank-dev and @JoeCardoso13 — please accept the repo invite first, then take a look. |
JoeCardoso13
left a comment
There was a problem hiding this comment.
Code Review
Thanks for this — the SDK is well-structured overall: clean error hierarchy, good use of httpx, sensible config, and solid integration tests. That said, there are a few issues that need addressing before merge, including one security concern and a bug.
BUG-1 — NameError in loader.py if MpakClient() raises
File: packages/sdk-python/src/mpak/loader.py
client is assigned inside the try block. If MpakClient() itself raises, client is never bound. The except Exception handler then calls sys.exit(1), which raises SystemExit, triggering the finally block — which calls client.close() and throws NameError on the unbound name.
try:
client = MpakClient() # if THIS raises...
...
except Exception as e:
sys.exit(1)
finally:
client.close() # NameError — client is unboundFix: Initialize client = None before the try and guard the close with if client: client.close().
SECURITY-1 — Zip slip vulnerability in client.py:load_bundle_from_url
File: packages/sdk-python/src/mpak/client.py
zipfile.ZipFile.extractall() does not sanitize entry paths. A zip containing entries like ../../etc/cron.d/evil will write files outside of dest_path. SHA256 verification confirms the zip hasn't been tampered with in transit, but does not protect against a malicious zip served from a compromised registry or CDN with a valid pre-computed hash.
with zipfile.ZipFile(bundle_path, "r") as zf:
zf.extractall(dest_path) # no path sanitizationFix: Validate each entry path before extracting:
resolved_dest = dest_path.resolve()
for member in zf.namelist():
if not (resolved_dest / member).resolve().is_relative_to(resolved_dest):
raise ValueError(f"Zip slip attempt detected: {member}")
zf.extractall(dest_path)DEAD-CODE-1 — Unreachable 404 branch in client.py:get_bundle_download
File: packages/sdk-python/src/mpak/client.py
The explicit if response.status_code == 404 check fires before response.raise_for_status(), so a 404 raises MpakNotFoundError and exits the try block immediately. The except httpx.HTTPStatusError handler's own 404 branch is therefore unreachable dead code.
if response.status_code == 404:
raise MpakNotFoundError(...) # caught here, exits try
response.raise_for_status()
...
except httpx.HTTPStatusError as e:
if e.response.status_code == 404: # never reached
raise MpakNotFoundError(...) from eOne of the two should be removed.
DESIGN-1 — Bundle download bypasses the configured client
File: packages/sdk-python/src/mpak/client.py
Registry API calls use self._client (configured with User-Agent, Accept, follow_redirects). The actual bundle download creates a raw ad-hoc stream, silently losing those headers:
with httpx.stream("GET", url, follow_redirects=True, timeout=self.config.timeout) as response:Fix: Use self._client.stream(...) for consistency.
DESIGN-2 — Version "0.1.0" hardcoded in two places independently
Files: src/mpak/__init__.py, src/mpak/types.py
__version__ in __init__.py and the user_agent default in MpakClientConfig both hardcode "0.1.0" independently of pyproject.toml. When the version is bumped, the user agent will silently report the wrong version.
Fix: Derive it at runtime: importlib.metadata.version("mpak").
Minor issues
verify_integrity_bytesin_integrity.py— defined and exported but never called anywhere. Dead code or planned for future use?mock_registry_urlfixture intests/conftest.py— defined but never used in any test file.PRODUCT.mdmonorepo layout — showssdk/(old name) andpython-sdk/(wrong name). Should besdk-typescript/andsdk-python/to match actual directories.pytest-asyncioin dev deps — listed as a dependency andasyncio_mode = "auto"is configured, but no async tests exist.generate-types.pystring replacement —content.replace(" EmailStr,\n", "")hardcodes a 4-space indent assumption; will silently fail if the code generator changes its output format.generate-types.pydouble fetch — the OpenAPI spec is fetched once viaurllib.request.urlopen(to print byte count) and then again bydatamodel-codegen --url. The first fetch is redundant.
- BUG-1: Fix NameError in loader.py when MpakClient() raises - SECURITY-1: Add zip slip protection in load_bundle_from_url - DEAD-CODE-1: Remove unreachable 404 branch in get_bundle_download - DESIGN-1: Use self._client.stream() for bundle downloads - DESIGN-2: Derive version from importlib.metadata - Remove dead code (verify_integrity_bytes, unused fixture, pytest-asyncio) - Fix PRODUCT.md and CLAUDE.md monorepo layout (sdk/ -> sdk-typescript/) - Fix generate-types.py redundant fetch - Add ty type checker to dev deps - Add sdk-python-ci.yml (ruff, ty, pytest on PR/push) - Add sdk-python-publish.yml (trusted publisher, publish on tag) - Add 8 unit tests covering all fixed bugs (12 -> 20 total)
CI status, PyPI version, Python versions, license, and mpak.dev badge.
JoeCardoso13
left a comment
There was a problem hiding this comment.
Re-review
All findings addressed except one minor item. Solid work — the fixes are correct and well-tested.
Status
| ID | Status |
|---|---|
| BUG-1 | Fixed, with tests (test_loader.py) |
| SECURITY-1 | Fixed, with tests (zip slip rejection + cleanup) |
| DEAD-CODE-1 | Fixed, with tests (500 → MpakError not MpakNotFoundError) |
| DESIGN-1 | Fixed, with tests (User-Agent header verified on download) |
| DESIGN-2 | Fixed (importlib.metadata in both __init__.py and types.py) |
| MINOR-1 | Fixed (dead code removed) |
| MINOR-2 | Fixed (unused fixture removed) |
| MINOR-3 | Fixed (PRODUCT.md paths corrected) |
| MINOR-4 | Fixed (pytest-asyncio removed) |
| MINOR-5 | Not addressed — see below |
| MINOR-6 | Fixed (redundant fetch removed) |
MINOR-5 — Brittle EmailStr replacement in generate-types.py (still open)
The hardcoded 4-space indent assumption in the post-processing step is still there:
content = content.replace(" EmailStr,\n", "").replace("EmailStr", "str")This will silently fail to remove the EmailStr import if datamodel-codegen ever changes its output formatting. Low risk since the generated file is checked in and would be caught on visual inspection, but worth a regex or AST-based replacement at some point.
Summary
mpakPython SDK (packages/sdk-python/) for searching, resolving, and downloading MCPB bundles from the mpak registryAccept: application/jsonheader soget_bundle_download()receives JSON metadata instead of following a 302 redirect to binaryx86_64now maps tox64(matching Node.js/registry convention) instead ofamd64@pytest.mark.integration) that hit the live registry to verify resolve, download, SHA256 verification, and extraction end-to-endTest plan
pytest -m "not integration"— 12 unit tests passpytest -m integration— 8 integration tests pass against live registry.mpak.devruff check .— cleanruff format --check .— clean